flowchart LR
A[("Raw reads")] --> B{"fastp"}
B --> C{"FastQC"} & E{"STAR"}
C --> D{"MultiQC"}
E --> F{"HTSeq"}
F --> G[("Count table")]
G --> I{"DESeq2"} & Z{"limma"} & W{"PCA"}
Z --> H{"PCA"} & J{"BioNERO"}
I --> K{"GSEA"}
J --> L{"ORA"} & N{"STRING"}
G --> V{"GSVA"}
Material and methods
RNA expraction and sequencing
Total RNA was isolated using the RNeasy Plant Mini Kit (QIAGEN, Germany) following the manufacturer’s protocol. To eliminate potential DNA contamination, the RNA was further treated with the TURBO DNA-free Kit (Thermo Fisher Scientific, USA) in a 50 µL reaction volume. Purification was performed using Agencourt RNA Clean XP beads (Beckman Coulter, USA). RNA concentration and integrity were evaluated using the Quantitative RiboGreen RNA Assay (Thermo Fisher Scientific) and the RNA 6000 Pico Kit (Agilent Technologies, USA). RNA libraries were constructed using the NEBNext Poly(A) mRNA Magnetic Isolation Module followed by the KAPA RNA Hyper Kit (Roche, Switzerland), adhering to the manufacturer’s instructions. Final library purification was carried out with KAPA HyperPure Beads (Roche, Switzerland). Library fragment size distribution and purity were assessed using the High Sensitivity DNA Kit (Agilent Technologies). Quantification was performed with the Quant-iT DNA Assay Kit, High Sensitivity (Thermo Fisher Scientific). Equimolar pools of each library (10 pM) were subjected to high-throughput sequencing on the Illumina HiSeq 2500 platform using paired-end reads (2 × 100 bp) with a 2% PhiX spike-in control.
Bioinformatics analysis
Initial quality assessment of murine transcriptomic data was performed using FastQC (v0.12.1), with aggregated reports generated via MultiQC (v1.27.1) [Ewels et al., 2016]. Raw reads were processed using fastp (v0.23.4) to remove low-quality sequences [Chen et al., 2018]. Read alignment was subsequently conducted with STAR (v2.7.11) [Dobin et al., 2013] against the Mus musculus reference genome (GENCODE GRCm39.vM36) using the annotation file gencode.vM36.chr_patch_hapl_scaff.annotation.gtf, followed by gene-level quantification with HTSeq-count (v2.0.5) [Anders et al., 2015]. Differential gene expression analysis between experimental groups was performed using DESeq2 (v1.44.0) [Love et al., 2014]. For comprehensive functional interpretation, we implemented gene-set enrichment analysis (GSEA) through fgsea (v1.31.6) [Korotkevich et al., 2016] and clusterProfiler (v4.12.6) [Wu et al., 2021], utilizing both the MSigDB [Castanza et al., 2023] and CellMarker 2.0 [Hu et al., 2023] databases. Immune cell enrichment patterns were further assessed via Gene Set Variation Analysis (GSVA) [Hänzelmann et al., 2013]. Immune cell populations were deconvoluted from tumor transcriptome profiles using the mMCP-counter algorithm [Petitprez et al., 2020]. Co-expression network analysis was performed using BioNERO (v1.12.0) [Almeida-Silva et al., 2022], employing signed hybrid network topology with Pearson correlation metrics. Potential batch effects were addressed using the removeBatchEffect function from the limma package [Ritchie et al., 2015]. Functional annotation of co-expressed gene modules included over-representation analysis against multiple pathway databases including MSigDB, KEGG [Kanehisa et al., 2025], Reactome [Milacic et al., 2024], and WikiPathways [Agrawal et al., 2024]. Protein-protein interaction networks were reconstructed using the STRING database v12.0 through the STRINGdb (v2.16.4) Bioconductor package [Szklarczyk et al., 2023]. Visualization of analytical results was achieved using a combination of pheatmap (v1.0.12) for heatmap generation, ggplot2 (v3.5.1) for general plotting, and EnhancedVolcano (v1.22.0) for specialized visualization of differential expression results. The entire analytical workflow was implemented in R (v4.4.2).
References
- Ewels, Philip, et al. “MultiQC: summarize analysis results for multiple tools and samples in a single report.” Bioinformatics 32.19 (2016): 3047-3048.
- Chen, Shifu, et al. “fastp: an ultra-fast all-in-one FASTQ preprocessor.” Bioinformatics 34.17 (2018): i884-i890.
- Dobin, Alexander, et al. “STAR: ultrafast universal RNA-seq aligner.” Bioinformatics 29.1 (2013): 15-21.
- Anders, Simon, Paul Theodor Pyl, and Wolfgang Huber. “HTSeq—a Python framework to work with high-throughput sequencing data.” Bioinformatics 31.2 (2015): 166-169
- Love, Michael I., Wolfgang Huber, and Simon Anders. “Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2.” Genome biology 15 (2014): 1-21.
- Hu, Congxue, et al. “CellMarker 2.0: an updated database of manually curated cell markers in human/mouse and web tools based on scRNA-seq data.” Nucleic acids research 51.D1 (2023): D870-D876.
- Hänzelmann, Sonja, Robert Castelo, and Justin Guinney. “GSVA: gene set variation analysis for microarray and RNA-seq data.” BMC bioinformatics 14 (2013): 1-15.
- Petitprez, Florent, et al. “The murine Microenvironment Cell Population counter method to estimate abundance of tissue-infiltrating immune and stromal cell populations in murine samples using gene expression.” Genome medicine 12 (2020): 1-15.
- Almeida-Silva, Fabricio, and Thiago M. Venancio. “BioNERO: an all-in-one R/Bioconductor package for comprehensive and easy biological network reconstruction.” Functional & Integrative Genomics 22.1 (2022): 131-136.
- Ritchie, Matthew E., et al. “limma powers differential expression analyses for RNA-sequencing and microarray studies.” Nucleic acids research 43.7 (2015): e47-e47.
- Kanehisa, Minoru, et al. “KEGG: biological systems database as a model of the real world.” Nucleic Acids Research 53.D1 (2025): D672-D677.
- Milacic, Marija, et al. “The reactome pathway knowledgebase 2024.” Nucleic acids research 52.D1 (2024): D672-D678.
- Agrawal, Ayushi, et al. “WikiPathways 2024: next generation pathway database.” Nucleic acids research 52.D1 (2024): D679-D689.
- Szklarczyk, Damian, et al. “The STRING database in 2023: protein–protein association networks and functional enrichment analyses for any sequenced genome of interest.” Nucleic acids research 51.D1 (2023): D638-D646.
Results
Experimental Design and Sample Overview
The analysis included a total of 14 transcriptomic samples representing three experimental groups: 1) melanoma (n=6); 2) melanoma supplemented with Bifidobacterium adolescentis 150 (n=6); and 3) melanoma supplemented with Lacticaseibacillus rhamnosus K32 (n=2) (Table 1). The study was conducted in two sequential replicates: the first replicate comprised only the melanoma and B. adolescentis-supplemented groups (n=8), while the second replicate incorporated all three experimental conditions (n=6). The genomes of the bacteria used in this experiment are available at the following links GCF_001010915.1 and GCF_000735255.1. For clarity in data interpretation, we introduced the following standardized codings:
Experimental groups:
Melanoma –>
MMelanoma with B. adolescentis supplement –>
M_BIFMelanoma with L. rhamnosus supplement –>
M_LAC
Experimental batch:
First experiment –>
batch_1Second experiment –>
batch_2
Quality control
Following quality filtering with fastp, aggregated quality metrics were compiled using MultiQC to summarize the FastQC reports. As illustrated in Figure 1, the MultiQC quality profile indicates that the sequencing data maintained adequate overall quality. Post-filtering, approximately 198 million reads (mean ± SD: 14 ± 5 million reads per sample) were retained for downstream analysis, with an average of 16 ± 7% of reads per sample removed during quality control (see Table 2 for detailed filtering statistics). Transcript abundance was quantified using the HTSeq tool. Figure 3 displays the distribution of HTSeq counts across all experimental samples. The resulting gene expression count matrix was normalized using the median of ratios method, with normalized values presented in Table 4.
MiltiQC report
Reads filtering by quality statistic
Mapping reads to reference counts
Saving 5 x 7 in image

Normalized counts table
Principal component analysis
Principal component analysis (PCA) was employed as a linear dimensional reduction technique for exploratory data analysis and visualization. Prior to PCA, the expression data underwent variance-stabilizing transformation (VST), which derives stable variance estimates from fitted dispersion-mean relationships. This transformation processes count data normalized by size factors, generating a matrix of approximately homeostatic values characterized by constant variance across the range of mean expression levels, while simultaneously accounting for library size differences. The transformed data were visualized through bi-dimensional projection of the first two principal components (Figure 4), with the proportion of explained variance for each component detailed in Figure 5. Permutation multivariate analysis of variance (PERMANOVA) revealed statistically significant associations between expression profiles and batch effects, while no significant relationships with experimental groups were detected.
Saving 5 x 4 in image

Saving 5 x 4 in image

Permutation test for adonis under reduced model
Marginal effects of terms
Permutation: free
Number of permutations: 9999
adonis2(formula = t(vsd) ~ batch + group, data = meta_data, permutations = 9999, method = "euclidean", by = "margin")
Df SumOfSqs R2 F Pr(>F)
batch 1 9421 0.19069 3.7386 0.0086 **
group 2 7784 0.15756 1.5445 0.1325
Residual 10 25199 0.51007
Total 13 49404 1.00000
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Differential expression analysis
Differential expression analysis was conducted using the DESeq2 package, employing the model formula ~ batch + group to account for batch-related dispersion effects. The analysis revealed positive log2 fold change (lfc) values associated with both B. adolescentis 150 and L. rhamnosus K32 supplementation groups, while negative values corresponded to the untreated melanoma control group. DESeq2 performs differential expression analysis by fitting generalized linear models to negative binomial distributed counts and evaluating linear contrasts between experimental groups. Our results demonstrated 3 significantly up-regulated and 16 down-regulated genes in the M_BIF group compared to the M control group. For the M_LAC comparison, 21 differentially expressed genes were identified relative to the M group, while no significant changes were observed in the reverse comparison. When directly comparing bacterial supplementation groups, 39 genes were associated with M_BIF and 116 with M_LAC. Differentially expressed genes were determined using stringent thresholds (adjusted p-value < 0.05, |lfc| > 1). Complete statistical results of the differential analysis are presented in Tables 4-6. Visual representation of the results through volcano plots (Figures 6-8) effectively illustrates the relationship between statistical significance (-log10 p-value) and magnitude of expression changes (lfc) across all comparisons.
M_BIF vs M

M_LAC vs M

M_BIF vs M_LAC

Gene set enrichment analysis
Gene Set Enrichment Analysis (GSEA) was employed to evaluate whether predefined gene sets exhibited statistically significant and concordant differences between biological states (e.g., phenotypic groups). This computational approach was implemented using multiple databases, enabling comprehensive identification of biological functions associated with the experimental conditions. For this analysis, we specifically utilized the HALLMARK MSigDB gene sets. The complete GSEA results are presented in Figure 9 and Tables 7-9. To delineate cell-type-specific expression patterns, we performed marker analysis using the Mouse Cell Marker 2.0 database, with results visualized in Figure 10 and detailed in Tables 10-12. Complementary to this, Gene Set Variation Analysis (GSVA) was conducted to examine HALLMARK MSigDB gene sets in a sample-oriented manner (Figures 11-16). The investigation was extended to immune cell profiling through two complementary approaches: mMCP-counter was implemented for broad immune cell type quantification (Figure 17), while GSVA was specifically applied to assess critical immune parameters including M1/M2 macrophage polarization ratios and CD8+ T cell abundance (Figures 18-21).
MSigDb Hallmark
M_BIF vs M
M_LAC vs M
M_BIF vs M_LAC

Cell Marker 2.0
M_BIF vs M
M_LAC vs M
M_BIF vs M_LAC

Co-expression analysis
Gene co-expression analysis was performed to identify functionally related gene clusters and investigate their potential associations with phenotypic traits. Prior to analysis, technical batch effects were removed from the expression matrix using the removeBatchEffect function implemented in the limma package. PCA of the batch-corrected data showed similar sample clustering patterns to the uncorrected dataset (Figures 11 and 12), with PERMANOVA confirming the effective removal of batch effects while revealing no significant associations with experimental groups. The BioNERO package identified several co-expression modules showing differential associations with experimental conditions. Notably, the greenyellow and sienna3 modules demonstrated positive correlations with the untreated melanoma group, while the pink and royalblue modules showed negative associations (Figures 13-16, Tables 13 and 14). Functional characterization through over-representation analysis revealed significant biological pathways enriched in these modules (Tables 15 and 16). Hub gene analysis identified key regulatory elements within each module (Tables 17 and 18), with correlation networks visualized for selected modules (Figure 17). Additional network characterization using STRING database highlighted potential functional relationships among module genes (Figures 18 and 22) with clustering (Figures 19 and 23) and enrichment (Figures 20, 21, 24, and 25) analysis.
Batch correction
Saving 5 x 4 in image

Saving 5 x 4 in image

Permutation test for adonis under reduced model
Marginal effects of terms
Permutation: free
Number of permutations: 9999
adonis2(formula = t(mat) ~ batch + group, data = meta_data, permutations = 9999, by = "margin")
Df SumOfSqs R2 F Pr(>F)
batch 1 0.0001345 0.03397 0.4480 0.9468
group 2 0.0009222 0.23285 1.5355 0.1214
Residual 10 0.0030029 0.75823
Total 13 0.0039604 1.00000
Net construction
Saving 8 x 5 in image

Saving 8 x 6 in image

Co-expressed modules linked to M group

Saving 6 x 12 in image

Over-representation analysis
Hub genes analysis and visualization

STRING analysis of modules

